
Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System

Abstract

Automatic visual speech recognition is an interesting problem in pattern recognition, especially when audio data is noisy or not readily available. It is also a very challenging task, mainly because the visual articulations carry less information than the audible utterance. In this work, principal component analysis is applied to image patches extracted from the video data to learn the weights of a two-stage convolutional network. Block histograms are then extracted as unsupervised learning features. These features are used to train a recurrent neural network with a set of long short-term memory cells to obtain spatiotemporal features. Finally, the obtained features are used in a tandem GMM-HMM system for speech recognition. Our results show that the proposed method outperforms the baseline techniques on the OuluVS2 audiovisual database for frontal-view phrase recognition, with cross-validation and test sentence correctness reaching 79% and 73%, respectively, compared to the baseline of 74% on cross-validation.
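The first stage of the pipeline described in the abstract learns convolutional filters by applying PCA to mean-removed image patches. Below is a minimal NumPy sketch of that filter-learning step only; the patch size (7×7) and number of filters (8) are illustrative assumptions, not values reported in the paper, and the block-histogram, LSTM, and GMM-HMM stages are omitted.

```python
import numpy as np

def extract_patches(image, k=7):
    """Collect all k x k patches of a grayscale frame as flattened vectors."""
    h, w = image.shape
    patches = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patches.append(image[i:i + k, j:j + k].ravel())
    return np.array(patches)

def learn_pca_filters(frames, k=7, n_filters=8):
    """Learn convolutional filters as the leading principal components of
    mean-removed patches pooled over all training frames (PCANet-style stage)."""
    all_patches = []
    for img in frames:
        p = extract_patches(img, k)
        p = p - p.mean(axis=1, keepdims=True)   # remove per-patch mean
        all_patches.append(p)
    X = np.vstack(all_patches)
    # Eigenvectors of the patch covariance matrix give the PCA filters.
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_filters]]
    return top.T.reshape(n_filters, k, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((32, 32)) for _ in range(10)]   # stand-in for mouth-region frames
    filters = learn_pca_filters(frames, k=7, n_filters=8)
    print(filters.shape)   # (8, 7, 7)
```

In a two-stage network, the same procedure would be repeated on the filter responses of the first stage before block histograms are computed.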
